Reinforcement Learning is a newer branch of Data Science focused on allowing a machine to learn from its own experience, just as a dog would when learning how to respond to basic commands. Since this is a more advanced topic, this tutorial will be VERY high level. This tutorial will cover the following learning objectives:
Reinforcement Learning Overview
Model-free RL vs. Model-based RL
Reinforcement Learning Overview
Summary
In Reinforcement Learning, an agent interacts with an environment. The agent's behavior is governed by a policy, which maps each state the agent observes to an action. After each action, the environment returns a reward, and the policy is adjusted over time so that actions that earn higher rewards become more likely.
A common real-world example of Reinforcement Learning is training a dog to sit. Typically, you'll first give the command (tell the dog "Sit!") and observe its initial reaction to set a baseline. You then use a reward system, such as a treat or toy, to reinforce good behavior and discourage bad behavior. In this example, the agent is the dog; the environment is your home, or wherever you're training the dog; the reward is the treat or toy; the action is the dog's response to your command; the state is the dog's current situation in the environment; and the policy is the dog's way of choosing an action in each state. Note that the best action can depend on the current state. For example, if the dog is already sitting, what will it do when you tell it to sit?
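To make the agent-environment loop concrete, here is a minimal sketch in Python based on the dog example. Everything in it (the DogEnvironment class, the two actions, the reward values) is invented for illustration rather than taken from any real RL library:

```python
import random

class DogEnvironment:
    """Toy environment: the dog is either 'standing' or 'sitting'."""

    def __init__(self):
        self.state = "standing"

    def step(self, action):
        """Apply the agent's action and return (new_state, reward)."""
        if action == "sit":
            # A treat is only earned for sitting on command, not for
            # already sitting (the "already sitting" case from above).
            reward = 1 if self.state == "standing" else 0
            self.state = "sitting"
        else:  # action == "stand"
            reward = 0
            self.state = "standing"
        return self.state, reward

def policy(state):
    """The policy maps the current state to an action. Here it is
    random; learning would gradually improve this mapping."""
    return random.choice(["sit", "stand"])

env = DogEnvironment()
state = env.state
for step in range(5):
    action = policy(state)            # agent chooses an action via its policy
    state, reward = env.step(action)  # environment returns new state and reward
    print(f"step {step}: action={action}, state={state}, reward={reward}")
```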
A Value Function estimates, given the current state and the policy being followed, how much total reward the agent can expect to collect from that point forward. The Discount Rate is a number between 0 and 1 inside that function that determines how heavily future rewards count relative to immediate ones. If the Discount Rate is high (close to 1), the agent values long-term rewards almost as much as immediate ones; if it is low (close to 0), the agent focuses mostly on the reward it can earn right now.
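Here is a small worked example of the Discount Rate in action. The reward sequence and the two rates are made-up numbers, chosen only to show how the weighting changes:

```python
def discounted_return(rewards, gamma):
    """Sum of rewards, each weighted by gamma raised to its time step."""
    return sum(reward * gamma**t for t, reward in enumerate(rewards))

rewards = [1, 0, 2]  # reward received at steps 0, 1, and 2

# Discount Rate close to 1: future rewards count almost as much as immediate ones.
print(discounted_return(rewards, 0.9))  # 1 + 0.9*0 + 0.81*2 = 2.62

# Discount Rate close to 0: the agent mostly cares about the immediate reward.
print(discounted_return(rewards, 0.1))  # 1 + 0.1*0 + 0.01*2 = 1.02
```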
The overall goal of Reinforcement Learning is to optimize the policy to maximize future rewards. Following the example above, you could vary how you give the "Sit!" command, changing your tone, volume, or facial expressions, to identify what the agent (the dog) responds to best.
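One hedged sketch of what "optimizing the policy" might look like: compare two hypothetical command styles by the average reward each earns, and keep the better one. The styles and their success rates below are invented for illustration:

```python
import random

def give_command(style):
    """Simulate the dog's response to one command style. The success
    rates are arbitrary assumptions, not measured values."""
    success_rate = {"calm": 0.8, "loud": 0.5}[style]
    return 1 if random.random() < success_rate else 0  # treat earned or not

def average_reward(style, trials=1000):
    """Estimate how well a command style works by averaging its rewards."""
    return sum(give_command(style) for _ in range(trials)) / trials

# Keep whichever policy variant earns the higher average reward.
best = max(["calm", "loud"], key=average_reward)
print("Best command style:", best)
```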
Model-free RL vs. Model-based RL
Summary
Model-free RL uses the agent's direct experience with the real environment to adjust its policy and reinforce good actions.
Model-based RL uses a model of the environment to create a simulated environment in which the agent can evaluate actions before taking them for real.
The State Space represents every possible situation the agent could find itself in, combined with the actions it could take, on the way to the desired outcome. With Model-free RL, the agent explores by trial and error, taking largely random actions until it finds the most efficient way to achieve the desired outcome, whereas with Model-based RL, the agent can simulate a wider range of actions cheaply to identify potential "cost savings" in its journey to the desired outcome.
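As a concrete illustration of the Model-free approach, here is a minimal tabular Q-learning sketch: the agent wanders a tiny four-position corridor, earns a reward only at the goal, and improves its policy purely from direct experience. The environment, reward, and hyperparameters are all invented for this example:

```python
import random

N_STATES = 4          # positions 0..3; position 3 is the goal
ACTIONS = [-1, +1]    # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

# The Q-table covers the whole State Space: one value per (state, action).
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(200):
    state = 0
    while state != N_STATES - 1:
        # Sometimes explore randomly, otherwise act greedily (epsilon-greedy).
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])

        next_state = min(max(state + action, 0), N_STATES - 1)
        reward = 1 if next_state == N_STATES - 1 else 0

        # Update the value estimate from direct experience; no model involved.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state

# After training, the greedy policy should step right (+1) from every position.
print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)])
```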
Model-free RL in a nutshell:
Comparatively straightforward to implement (no model or simulation to set up)
Learns in the real environment itself, rather than an approximation of it
Model-based RL in a nutshell (a minimal sketch follows this list):
Sample efficient (it can learn in the simulated environment before going into the real environment, so it needs fewer real-world trials)
Lower chance of damaging hardware (for physical agents such as self-driving cars)
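To round out the comparison, here is a minimal Model-based sketch: the agent gathers a handful of real interactions, fits a simple model of the environment, and then plans with cheap simulated rollouts instead of more real trials. The two-action environment and the counts-based model are invented for illustration:

```python
import random

def real_step(action):
    """The real environment: action 'a' pays off 70% of the time,
    action 'b' only 30% (unknown to the agent)."""
    return 1 if random.random() < {"a": 0.7, "b": 0.3}[action] else 0

# 1. Gather a small sample of real experience (expensive in the real world).
counts = {"a": [0, 0], "b": [0, 0]}  # [times tried, total reward]
for _ in range(20):
    action = random.choice(["a", "b"])
    counts[action][0] += 1
    counts[action][1] += real_step(action)

# 2. Build a simple model: the estimated payoff probability per action.
model = {a: (r / n if n else 0.5) for a, (n, r) in counts.items()}

def simulated_step(action):
    """Simulate the environment with the learned model instead of
    touching the real (costly, breakable) system."""
    return 1 if random.random() < model[action] else 0

# 3. Plan in simulation: many cheap rollouts pick the better action.
scores = {a: sum(simulated_step(a) for _ in range(1000)) for a in model}
print("Planned action:", max(scores, key=scores.get))
```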